我们为在不平衡的短文本数据集中发现稀缺主题提供了一个简单而通用的解决方案,即基于共同发生的网络模型CWIBTD,可以同时解决短文本主题的稀疏和不平衡的问题并减轻效果的效果。偶尔成对的单词出现,使模型更多地集中在发现稀缺主题上。与以前的方法不同,CWIBTD使用共发生的单词网络对每个单词的主题分布进行建模,从而改善了数据空间的语义密度,并确保其在识别稀有主题方面的敏感性,通过改善计算节点活动的方式和正常方式。在某种程度上,稀缺的话题和大主题。此外,使用与LDA相同的Gibbs采样使CWIBTD易于扩展到Viri-OUS应用程序方案。在不夸张的短文本数据集中进行的广泛实验验证证实了CWIBTD在发现稀有主题时的优越性。我们的模型可用于早期,准确地发现社交平台上新兴主题或意外事件。
translated by 谷歌翻译
现有的实例分割方法已经达到了令人印象深刻的表现,但仍遭受了共同的困境:一个实例推断出冗余表示(例如,多个框,网格和锚点),这导致了多个重复的预测。因此,主流方法通常依赖于手工设计的非最大抑制(NMS)后处理步骤来选择最佳预测结果,这会阻碍端到端训练。为了解决此问题,我们建议一个称为Uniinst的无盒和无端机实例分割框架,该框架仅对每个实例产生一个唯一的表示。具体而言,我们设计了一种实例意识到的一对一分配方案,即仅产生一个表示(Oyor),该方案根据预测和地面真相之间的匹配质量,动态地为每个实例动态分配一个独特的表示。然后,一种新颖的预测重新排列策略被优雅地集成到框架中,以解决分类评分和掩盖质量之间的错位,从而使学习的表示形式更具歧视性。借助这些技术,我们的Uniinst,第一个基于FCN的盒子和无NMS实例分段框架,实现竞争性能,例如,使用Resnet-50-FPN和40.2 mask AP使用Resnet-101-FPN,使用Resnet-50-FPN和40.2 mask AP,使用Resnet-101-FPN,对抗AP可可测试-DEV的主流方法。此外,提出的实例感知方法对于遮挡场景是可靠的,在重锁定的ochuman基准上,通过杰出的掩码AP优于公共基线。我们的代码将在出版后提供。
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
The task of Few-shot learning (FSL) aims to transfer the knowledge learned from base categories with sufficient labelled data to novel categories with scarce known information. It is currently an important research question and has great practical values in the real-world applications. Despite extensive previous efforts are made on few-shot learning tasks, we emphasize that most existing methods did not take into account the distributional shift caused by sample selection bias in the FSL scenario. Such a selection bias can induce spurious correlation between the semantic causal features, that are causally and semantically related to the class label, and the other non-causal features. Critically, the former ones should be invariant across changes in distributions, highly related to the classes of interest, and thus well generalizable to novel classes, while the latter ones are not stable to changes in the distribution. To resolve this problem, we propose a novel data augmentation strategy dubbed as PatchMix that can break this spurious dependency by replacing the patch-level information and supervision of the query images with random gallery images from different classes from the query ones. We theoretically show that such an augmentation mechanism, different from existing ones, is able to identify the causal features. To further make these features to be discriminative enough for classification, we propose Correlation-guided Reconstruction (CGR) and Hardness-Aware module for instance discrimination and easier discrimination between similar classes. Moreover, such a framework can be adapted to the unsupervised FSL scenario.
translated by 谷歌翻译
3D对象检测是自动驾驶的重要组成部分,深层神经网络(DNNS)已达到此任务的最新性能。但是,深层模型臭名昭著,因为将高置信度得分分配给分布(OOD)输入,即未从训练分布中得出的输入。检测OOD输入是具有挑战性的,对于模型的安全部署至关重要。已经针对分类任务进行了广泛研究OOD检测,但是它尚未对对象检测任务,特别是基于激光雷达的3D对象检测的注意力。在本文中,我们关注基于激光雷达的3D对象检测的OOD输入的检测。我们制定了OOD输入对于对象检测的含义,并提议适应几种OOD检测方法进行对象检测。我们通过提出的特征提取方法来实现这一目标。为了评估OOD检测方法,我们开发了一种简单但有效的技术,用于为给定的对象检测模型生成OOD对象​​。我们基于KITTI数据集的评估表明,不同的OOD检测方法具有检测特定OOD对象​​的偏差。它强调了联合OOD检测方法的重要性以及在这个方向上进行更多研究。
translated by 谷歌翻译
由于经过验证的2D检测技术的适用性,大多数当前点云检测器都广泛采用了鸟类视图(BEV)。但是,现有方法通过简单地沿高度尺寸折叠的体素或点特征来获得BEV特征,从而导致3D空间信息的重丢失。为了减轻信息丢失,我们提出了一个基于多级特征降低降低策略的新颖点云检测网络,称为MDRNET。在MDRNET中,空间感知的维度降低(SDR)旨在在体素至BEV特征转换过程中动态关注对象的宝贵部分。此外,提出了多级空间残差(MSR),以融合BEV特征图中的多级空间信息。关于Nuscenes的广泛实验表明,该提出的方法的表现优于最新方法。该代码将在出版时提供。
translated by 谷歌翻译
准确可靠的传感器校准对于在自主驾驶中融合激光雷达和惯性测量至关重要。本文提出了一种新型的3D-LIDAR和姿势传感器的新型三阶段外部校准方法,用于自主驾驶。第一阶段可以通过点云表面特征快速校准传感器之间的外部参数,以便可以将外部参数从大的初始误差范围缩小到很小的时间范围。第二阶段可以基于激光映射空间占用率进一步校准外部参数,同时消除运动失真。在最后阶段,校正了由自动驾驶汽车的平面运动引起的Z轴误差,并最终获得了精确的外部参数。具体而言,该方法利用了道路场景的自然特征,使其独立且易于在大规模条件下应用。现实世界数据集的实验结果证明了我们方法的可靠性和准确性。这些代码是在GitHub网站上开源的。据我们所知,这是第一个专门为自动驾驶设计的开源代码,用于校准激光雷达和姿势传感器外部参数。代码链接是https://github.com/opencalib/lidar2ins。
translated by 谷歌翻译
最近,深度加固学习(RL)在机器人操作应用中表现出了一些令人印象深刻的成功。但是,由于样本效率和安全性问题,现实世界中的培训机器人是不平凡的。提出了SIM到现实的转移来解决上述问题,但引入了一个名为“现实差距”的新问题。在这项工作中,我们通过使用单个摄像头的输入来解决上述问题,为基于视觉的组装任务引入SIM模型学习框架,并在模拟环境中进行培训。我们提出了一种基于循环一致的生成对抗网络(CycleGAN)和力量控制转移方法来弥合现实差距的域适应方法。我们证明,在模拟环境中训练有训练的拟议框架可以成功地转移到真实的孔洞设置中。
translated by 谷歌翻译
为了在盲图超级分辨率(SR)上取得有希望的结果,一些尝试利用低分辨率(LR)图像来预测内核并改善SR性能。但是,由于不可用的现实世界模糊内核,这些监督的内核预测(SKP)方法是不切实际的。尽管提出了一些无监督的降解预测(UDP)方法来绕过此问题,但\ textIt {contercestency}之间的降解嵌入和SR功能之间仍然具有挑战性。通过探索降解嵌入与SR功能之间的相关性,我们观察到共同学习内容和降解感知功能是最佳的。基于此观察结果,提出了一个名为CDSR的内容和退化的SR网络。具体而言,CDSR包含三个新建立的模块:(1)将基于重量的编码器(LPE)应用于共同提取内容和降解功能; (2)采用基于域查询的基于注意力的模块(DQA)来适应不一致; (3)基于密码的空格压缩模块(CSC),可以抑制冗余信息。对几个基准测试的广泛实验表明,即使与最先进的SKP方法相比,提议的CDSR的表现都优于现有的UDP模型,并在PSNR和SSIM上实现竞争性能。
translated by 谷歌翻译